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Abstract 


We propose a new method for PolSAR (Polarimetric Synthetic Aperture Radar) 
imagery classification based on stochastic distances in the space of random ma¬ 
trices obeying complex Wishart distributions. Given a collection of prototypes 
and a stochastic distance d [.,.), we classify any random matrix X us¬ 
ing two criteria in an iterative setup. Firstly, we associate X to the class which 
minimizes the weighted stochastic distance Wf„d(X,Zm), where the positive 
weights Wm are computed to maximize the class discrimination power. Sec¬ 
ondly, we improve the result by embedding the classification problem into a 
diffusion-reaction partial differential system where the diffusion term smooths 
the patches within the image, and the reaction term tends to move the pixel val¬ 
ues towards the closest class prototype. In particular, the method inherits the 
benefits of speckle reduction by diffusion-like methods. Results on synthetic 
and real PolSAR data show the performance of the method. 
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1 Introduction 

Classification is one of the most important techniques for image analysis. It aims at 
mapping each pixel into a class, so it transforms values into information. 

Classification can be performed using a variety of sources, mostly the spectral 
information (the observed value in each pixel), spatial or contextual information, 
and ancillary data (ground truth, for instance). The latter is usually only available in 
very restricted areas, from which training samples can be obtained. 

The simplest available classification techniques rely only on prxel-wise informa¬ 
tion, i.e., on the observation in each coordinate: Isodata, Parallelepiped and Point- 
wise Maximum Likelihood are examples of these methods. Arguably, the most suc¬ 
cessful techniques exploit both the spectral information and the context. This is 
mostly due to the fact that images exhibit a great deal of spatial redundancy, i.e., 
spatially neighboring pixels tend to be alike. 

As an example of contextual classification one should mention techniques based 
on Markovian models. Geman and Geman (1) posed the classification process as an 
estimation problem and, as such, proposed a number of estimators and algorithms. 
These techniques rely on variations of the following idea: the class m in each coordi¬ 
nate should satisfy a criterion that, simultaneously, optimizes the pointwise spectral 
evidence and a contextual measure of smoothness, for example maximizing 

;■)) -I- (1 - X)N[m,dij], (1) 

*CTIM. Dpto. de Ingenieria ElectronicayAutomatica, Universidad de Las Palmas de G.C., Campus de 
Tafira, 35017, Spain, luis.gomez@ulpgc.es 

^CTIM. Dpto. de Informatica y Sistemas, Universidad de Las Palmas de G.C., Campus de Tafira, 35017, 
Spain, lalvarez@ctim.es 

^lmazorra@ctim.es 

^Laboratorio de Computagao Cientifica e Analise Numerica, Universidade Federal de Alagoas, Av. 
Lourival Melo Mota, s/n, 57072-900 Maceio - AL, Brazil, acfrery@gmail.com 


2 



where Ac [0,1] is the relative weight of the spectral evidence over the context, fm [z[i, j)) 
is the likelihood of the observation z{i,j] in coordinate {i,j) with respect to the 
model characterized by the probability density function fm, rn e M} is one 
of the M possible classes, and N{m,dij) is a nondecreasing function on the num¬ 
ber of neighbors of coordinate denoted by d,y, that have been classified as 
m. If A = 0, only the context is relevant and the most frequent class in the neigh¬ 
bours is the optimal solution. If A = 1, only the radiometric pointwise information 
is relevant, and the maximum likelihood estimator (assuming independent obser¬ 
vations) is the optimal solution. Specialized algorithms, such as the ICM (Iterated 
Conditional Modes), MPM (Maximum Posterior Modes), and Simulated Annealing 
are required to obtain (often approximate) solutions for intermediate values of A; cf. 
Ref. (3. Eq. (T) makes it clear that the context, in this approach, is incorporated by 
means of the neighbouring classes. 

The aim of this work is proposing a new classification procedure based on both 
context and radiometric information, but without the use of classes as descriptors of 
the former. For the latter, we tackle the problem of classifying Polarimetric Synthetic 
Aperture Radar (PolSAR) imagery. 

According to Ref. (3] , the early developments of imaging radar aimed at the char¬ 
acterization of aircraft targets. Spatial resolution was a fundamental issue with these 
images; it depends directly on the dimensions of the antenna, more precisely, on its 
aperture. Deployable antennas would yield unacceptable spatial resolutions of the 
order of hundreds of metres, even kilometres. Higher resolutions can be obtained 
by controlling the illumination, forming longer synthetic (as opposed to physical) 
antennas, yielding the broad area of SAR - Synthetic Aperture Radar. 

The main differences between SAR sensors and sensors that operate in the visi¬ 
ble spectrum are 

Wavelength: The former operate in the microwave section of the electromagnetic 
spectrum. 

Activeness: SAR sensors carry their own source of illumination. 

Coherence: SAR sensors are able to record the relative phase of the emitted and 
incidence signals. 

These characteristics lead to a number of interesting and, oftentimes, challeng¬ 
ing properties, among them: 

• SAR sensors are little affected by adverse weather conditions, such as fog, rain, 
smog, etc., and they provide images regardless the presence of daylight. 

• The return is mostly sensitive to the target dielectric properties, and to its ge¬ 
ometry at the wavelength scale, which typically ranges from millimeters to 
meters. 

• SAR images, after being processed, can be adequately described by a multi¬ 
plicative rather than additive model for the data. 
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Polarimetric SAR (PolSAR) extends the imaging ability of single SAR imaging. A 
PolSAR sensor transmits a linearly polarized signal and records two orthogonal po¬ 
larizations of the returned signal. Such multidimensional information allows deeper 
investigation of the scattering mechanisms of the targets under study. Fully PolSAR 
images record the four possible combinations of the signal according to the polar¬ 
ization of the transmitting and receiving antennas. The first practical fully PolSAR 
sensor, the L-band AIRSAR, was developed by the Jet Propulsion Laboratory (JPL) in 
1985. 

On the one hand, the deterministic approach has been used with success in the 
classification of PolSAR imagery, usually at the expense of using additional infor¬ 
mation. For instance, Shi et al. (4] perform Manifold Learning on layers of features 
derived from polarimetric decompositions, while Negri et al. |5] use neighbouring 
information and Support Vector Machines. 

On the other hand, statistical approaches able to cope with the departure from 
the usual Gaussian additive data formation provide an attractive framework for Pol¬ 
SAR image classification. Among others. Refs. |6l|7l[8] provide a comprehensive ac¬ 
count of the models, and of the relations among them, that have been used to de¬ 
scribe polarized and fully polarimetric SAR images. 

The classification of PolSAR imagery is an important research topic. Among the 
ones that use stochastic models for the fully polarimetric information, one should 
mention the pioneering work of Lee et al. (9) , who present the multivariate com¬ 
plex Wishart distribution and some of its properties for the Remote Sensing com¬ 
munity. Other techniques have been proposed as, for instance, the minimization of 
stochastic distances between segments of data and prototypes [lO], and the use of 
spectral and contextual information by means of the ICM algorithm under the Potts 
model as prior distribution |2] . These works use the data as available to the users, 
differently from another approach, namely the classification based on the HI a de¬ 
composition HD. It is noteworthy that the latter work presents a deterministic and 
unsupervised technique, while the others rely on the availability of prototypes and 
stochastic models. 

Diffusion-reaction equations have been successfully applied in many different 
contexts, e.g. Chemistry and Ecology. In the field of image processing, diffusion- 
reaction equations have been applied for instance in quantization [12], filtering and 
segmentation (T^, and in medical applications fT4l . 

The general shape of a diffusion-reaction partial differential equation is given by 

+ ( 2 ) 

where u{t, x, y) represents the solution of the equation at instant t and coordinates 
(x, y), if ( m) is the diffusion term given, typically, by an elliptic second order differen¬ 
tial operator, for instance the Laplacian operator. The reaction term is flu), whose 
shape strongly depends on the intended application. Roughly speaking, the reaction 
term tends to move the solution u[t,x, y) of the equation towards the nearest stable 
equilibrium state associated to the ordinary differential equation (ODE) u' - f{u) 
[Um is an stable equilibrium state of u' = f{u) if f{Um) - 0 and f'iUm) < 0). Usually, 
u{0,x,y) represents the observed data and u(t,x,y) represents its evolution under 
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the action of the differential equation where the diffusion and reaction terms are 
in competition. On the one hand, the diffusion term tends to smooth the solution 
taking into account the neighbors values and, on the other hand, the reaction term 
tends to move the solution towards some asymptotic values. 

In this work we address a difficult problem: the classification of PolSAR images. 
Our approach is based on weighted stochastic distances and a diffusion-reaction 
partial differential equations system which imposes a certain level of spatial smooth¬ 
ness. This work is an extension of Ref. dsi , where we proposed a diffusion-reaction 
system, but using the Euclidean distance to perform PolSAR image classification. 
We will show that the use of weighted stochastic distances, instead of the Euclidean 
distance, improves significantly the classification results. 

The rest of the article is organized as follows: Section deals with the statis¬ 
tical description of PolSAR data, including expressions of stochastic distances be¬ 
tween models. Sectionj^is devoted to the classification of these data using weighted 
stochastic distances between complex Wishart distributions. Section ideals with 
the discretization of the diffusion-reaction differential system. Section]^ presents 
results on the classification of simulated and real PolSAR data, and, finally, in Sec- 
tionlDsome conclusions are drawn. 


2 The Scaled Complex Wishart distrihution 

This section is based on the work by Nascimento et al. [16] . 

2.1 Characterization 

The polarimetric coherent information associates, in each frequency of operation, 
to each pixel a 2x2 complex matrix with entries Svv> Svh> Shv> and Shh. where Sij 
is the backscattered signal for the ith transmission and jth reception linear polar¬ 
ization, i, j = H, V. Under mild conditions, Shv = Syn. and the scattering matrix can 
be simplified into the three-component complex vector s - [Sw V^Svh Shh] . 
where ^ denotes vector transposition. This random vector can be modeled by the 
zero-mean multivariate complex Gaussian distribution [17]. 

Different targets are characterized by different variances and, thus, are hard to 
discern visually. In order make such difference noticeable, and to improve the signal- 
to-noise ratio, L ideally independent measurements of the same target are obtained 
while processing the raw data. This quantity is the number of looks. These observa¬ 
tions are used to produce the “multilook sample covariance matrix format” 

Z-lists*, 

^ i=\ 

where the superscript * represents the complex conjugate transpose of a vector, and 
Si, 1=1,2,..., L, are the L scattering vectors. Assuming that these vectors are inde¬ 
pendent, Z is an Hermitian positive definite matrix, and it follows a scaled complex 
Wishart distribution [18] . Having Z and L as parameters, the scaled complex Wishart 
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distribution is characterized by the following probability density function 

j^3Li^iL—3 

/zU:X,L)=^^^exp(-Ltr(2;-i2)), (3) 

where FalL) = :7r^n;=or(L- i] for L > 3, r(-) is the gamma function, tr(-) represents 
the trace operator, | ■ | denotes the determinant operator, Z is the covariance matrix 
associated to s, Z = E(ss*), where E(-) is the expectation. The first moment of Z 
satisfies E(Z) = Z. We denote Z ~ W[L,L) to indicate that Z follows the scaled 
complex Wishart distrihution. As can be seen in [19], this distribution is able to 
accommodate an arbitrary number of polarimetric components. 

2.2 Parameter estimation 

Let {Zi,Z 2 ,...,Zisr} be a random sample from Z ~ W{1.,L). The ML estimators for 
Z and L, namely Zml and Lml> respectively, are quantities that maximize the log- 
likelihood function associated to the Wishart distribution, which is given by 

N 

^(0)=3iVLlogL+(L-3)^log|Zfc|-LiVlog|Z| 

^=1 ( 4 ) 

2 

- 3N\ogji - W ^ logTCL - i) - AiLtr(Z"i Z), 
i=0 

where Z - N~^ ^k=i ^ ~ ^ ~ vec(Z), and vec(-) is the column stacking 

vectorization operator. Frery et al. showed that Zml is given by the sample mean 

Zml = Z, (5) 

and that Lml satisfies the following non-linear equation 

_ 1 w 

31ogLML+— ^log|Zfc|-log|Z|--i//™(LML) = 0, (6) 

k=l 

where y/f^ (•) is the zeroth order term of the yth-order multivariate polygamma func¬ 
tion given by 

where (-) is the ordinary polygamma function expressed by 

^-Mogni) 

for y > 0, and (■) is the digamma function. 

Nascimento et al. (H noted that Lml is biased, and proposed a number of cor¬ 
rection techniques obtained subtracting an estimator of the bias from the ML es¬ 
timator. Using their results, we employ the Box-Snell estimator of the bias, given 
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by 
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B(i) = 


2NL 




3 1 
l\ 


2N[i//<i>a)-f 


(7) 


We first compute Lml by solving i), then we evaluate B(Lml) with I?) and, finally, 
we use the first-order bias-corrected estimator L - Lml “ 


2.3 Stochastic distances between Wishart laws 

Frery et al. (19), while seeking for contrast measures between samples, derived ex¬ 
pressions for the stochastic distances between pairs of Wishart laws with (possi¬ 
bly) different covariance matrices Zi and Z 2 , and same number of looks L. They 
obtained the Kullback-Leibler, Hellinger, Bhattacharyya and Renyi (of order p) dis¬ 
tances. 

The two first are of interest in this work, and they are given, respectively, by 


<^Kl(21i,22) 

and 


rtr(Z7iZ2-tZ-iZi) 

M 2 




2VI2:iiiZ2i 


L 


( 8 ) 


(9) 


Notice that both distances depend on two very simple operations: the determinant 
and the inverse. 

The two other distances, namely Renyi and Bhattacharyya, were not used be¬ 
cause the former depends on an extra parameter {p), and the latter is a simple in¬ 
creasing transformation of the Hellinger distance, namely = -log(l - d^}, so it 
adds little to this study. 


3 Complex Wishart matrices classification using weighted 
stocastic distances 

In this paper we address a multiclass classification problem in .5/, the space (or cone) 
of the of Hermitian positive definite matrices. All elements in are assumed to fol¬ 
low a scaled complex Wishart distribution, as defined by the density given in Eq. . 

We assume there are M classes and, for each class m e {1,...,M}, we denote by 
Zm its prototype, given by a random matrix in that follows the scaled complex 
Wishart distribution TV For each point of a PolSAR image we have a ran¬ 
dom matrix X which is assumed to belong to one of the predefined classes. 

A basic classification procedure is to assign each Xes^ to the class m for which 
X is closest to Zm, in a stochastic distance sense. This pointwise procedure is prone 
to errors due to the variability of the observations. In order to improve this basic 
classification procedure, we introduce a collection of weights {Wm} such that X is 
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associated to the class rriannW £ satisfying 


rnminiX) = argmin Wmd{X,Zm), 

m 


(10) 


i.e., among the M possible classes, is the class whose prototype is clos¬ 

est to X with respect to a distance d weighted by a factor that depends on the class. 
If Wm = oo for any class, that class is forbidden; if Wm = 0, the classification into m 
is mandatory. In the following subsection we will propose a procedure to estimate 
{Wm] in order to maximize the discrimination power of the classes. 

These multiplicative factors have the effect of modifying the number of looks 
Lm of each class either direcdy, as in the Kullback-Leibler distance, Eq. (^, or as a 
first order approximation in the Hellinger distance, Eq. 0- Their purpose is to exert 
control on the relative importance each class has on the procedure. 

Each of the M prototypes is built by selecting a ROI (region of interest) that will 
serve as ground truth or reference. Each ROI is split in two disjoint sets of equal size 
by simple random sampling without replacement. The first set is used as training 
data, while the second is employed as test data. For each class, the training samples 
are used to estimate (Zm, Lm) by improved maximum likelihood, respectively, as de¬ 
scribed in Section i i.2 Denote by Mm the number of pixels in the training set of 


class m, 1 ^ rn < M, and its pixels values by Zm, 1 < fc < Mm- 

The weights Wm are computed such that they maximize the discrimination power 
of the stochastic distance. This is done by using an energy optimization strategy, 
with the aid of the following energy function 


M 


1 


Myy. 


(p{^md{Zm,Zm) U>m’ d{Zmt Zm']), 


m=\ k=lm'^m 


( 11 ) 


where 


and A > 0 is a parameter. In order to simplify the optimization problem we assume 
that Y.m - 1- Roughly speaking, by minimizing Eq. m we make the observa¬ 
tions Zm be as close as possible to Zm, and as far as possible to any other class 
representative Zm’ ■ The minimum of the above energy function is found using a 
gradient descent type algorithm. In all the experiments showed in this paper, Wm is 
initialized as 1 / M and A is fixed to 1. 

4 Diffusion-reaction differential system 

This section follows the ideas introduced in Ref. [I5l, but using weighted stochastic 
distances instead of the Euclidean distance. In order to introduce a smoothing pro¬ 
cedure in the classification step, we embed the classification problem in a Diffusion- 
Reaction differential system. 

We denote by 2(0, x, y) the initial covariance matrix provided by the PolSAR im¬ 
age, and by 2 (f, x, y) its evolution under the action of the diffusion-reaction system. 
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Let Z{t,x, y) be the family of Wishart random matrices associated to 5;(f, x, y). The 
diffusion-reaction system we propose is given by 

dT , '1 c 1 

— ^aAl + [mmWmd[Z,Zm)- mm Wmd[Z,Zm)][I.-'^mmmi.z)], (12) 

dt m m^mnun(Z) 

where AZ represents the spatial Laplacian differential operator applied on each com¬ 
ponent of the matrix Z(t, x, y), {Z^l represents distributions that describe the classes 
present in the PolSAR image, and a > 0 is a weight parameter which balances the in¬ 
fluence of the diffusion and reaction terms, and (^) obtained using eq. (To). 
The diffusion term, given hy the Laplacian operator tends to smooth the covari¬ 
ance matrix Z (t, x, y), and the reaction term, given by the remainder of the equation, 
tends to move Z(f, x, y) towards the nearest covariance matrix Z^ according to the 
weighted stochastic distance Wmd{Z{t, x, y), Zm)- 


4.1 Diffusion-reaction system discretization 

To discretize equation (12) we denote by Z"^. the approximation Z{ndt, ih,jh), and 
hy Z"^. its associated covariance matrix, where i,j e N, and 8t and h are the dis¬ 
cretization values. We propose the following two-step algorithm where, in the first 
step, we discretize the contribution of the diffusion part of the equation and, then, in 
the second step, we discretize the contribution of the reaction part of the equation 
taking into account that, locally, using a linear approximation of the reaction term, 
the solution of the equation is given by an exponential function; 


Step 1: 


Z”^ = Z"^ + adr 


yn +yn + y« + y” -4y” 


Step 2: 




SnmmWrnd^Zf'Zm)- min Wmd{Zf'Zm)\ 




where Zf'. represents the covariance evolution using only the diffusion operator, 
and ZPI its associated random matrix. Notice that the first step incorporates the 
contextual evidence. 

We observe that if 1 - 4aSt/h^ > 0, then Zf'. is a convex combination of Zf ., 

‘•7 ^>7 


Zf , ., Zf_, ., Zf . ,, Zf E sd, and, in particular, Zf'. e sd. Moreover since Zft^ is 
i+ 1,7 I i»7 ^>7"*”-*^ m7 ^ ^’7 ^’7 

a convex combination of Zm„,j,,(z") and Z”^., then e sd. 

That is, if the original covariance matrices j}i provided by the original 


PolSAR image belong to sd, then , , ^'^sd for all n > 0 and, therefore, the as¬ 
sociated random matrices Z". are well defined. Notice that the above discretization 

lyj 


scheme depends on three parameters: a, 81 and h. However, looking at the shape 
of the scheme, we point out that it depends just on the values a 81 ! (Step 1), and 
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on 5? (Step 2); therefore, without loss of generality, we fix h = 1. The only param¬ 
eters that have influence on the results are a and 61 . As explained above, in order 
to have a well-defined procedure with solutions in si, these parameters must satisfy 
the condition \ -Aad t>0;6t represents the time discretization step in the evolution 
equation. 

The smaller 61 is, the better becomes the approximation of the above scheme to 
the actual continuous solution of the differential system given by eq. (H- In prac¬ 
tice, as far as 51 is small enough we do not expect a strong influence of d f in the 
results. In practice, the most important parameter is a. The larger a is, the stronger 
the smoothing effect of the differential system given in eq. (T2) becomes. 

In all the experiments presented in this paper, the discretization parameters 
have been fixed to dr = 0.01 and a = 0.5. Different choices and combinations are 
possible, and a detailed comparison study will be performed. 


5 Experimental Setup 

We compare the performance of the proposed methodology by classifying both sim¬ 
ulated and real PolSAR images with respect to the following techniques 

1. Maximum likelihood (ML) under the Wishart model: each observation is as¬ 
signed to the class whose density, as expressed in Eq. (^, is maximized. 

2. Euclidean distance (ED): each observation is classified into the class which 
minimizes the Euclidean distance between the observation and its prototype. 

3. Hellinger distance (HD): each observation receives the label of that class which 
minimizes this distance to its prototype; cf. Eq. (9) . 

4. Kullback-Leibler distance (KL): same as above, but using Eq. 1^. 

5. KL distance and optimized weights (KL-i-OW): same as in|^ but using the op¬ 
timized weights computed minimizing the energy functional (TT). 

6. Diffusion reaction equation with KL-i-OW (DR-i-KL-i-OW-i- n): The reaction diffu¬ 
sion equation is solved using n iterations of the iterative discretization scheme, 
then each resulting observation is classified using KL+OW. 

The visualization of PolSAR images requires stipulating a projection of the el¬ 
ements of sd into the unitary cube, then mapping these three components to the 
red, green and blue components of a colour image (the RGB representation). Lee 
and Pottier (S) discuss a number of such projections. We adopt here a simple visu¬ 
alization technique that emphasizes the quality of the classifications obtained. We 
assign a unique colour to each class representative 2^, then, we assign a colour to 
any e sd using a linear interpolation procedure based on the Euclidean distance 
bewteen 2j j- and 2^- The smaller the Euclidean distance between 2i j- and 2m, the 
larger the weight of the colour component of class m is in the interpolation proce¬ 
dure. 
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5.1 Simulated Image 

In the first experiment we use a 300 x 300 pixels phantom image with three classes 
generated using three Wishart distributions. The parameters used for the simula¬ 
tion are those estimated in the training samples used in the experiment discussed 
in the next section. These training samples come from the ocean, forest and urban 
classes identified in Fig|^ Each deviate from a Wishart law is obtained by simulating 
entries of the complex matrix entries, which obey complex Gaussian observations, 
then forming the sample covariance matrix and adding as many of them as the (inte¬ 
ger) number of looks. This requires stipulating the number of looks, that in our case 
was L = 4 for all classes, and the variance and covariances of the these entries [17] . 

Table presents the overall accuracy of classifications produced by the tech¬ 
niques here considered. The optimized weights, obtained through the minimization 
of the energy function presented in eq. {m, were w - (0.23,0.50,0.28). In all cases, 
the equivalent number of looks was informed L = 4 and fixed for the three classes. 
For each class, the technique which provides the worst classification is used as base¬ 
line for computing the improvement each technique provides. These improvements 
are informed as percentages between parentheses. 


Method 


Accuracy (Improvement) by Class 

CPU 

time 


Class 1 (Ocean) Class 2 (Forest) 

Class 3 (Urban) 

(s) 

ML 

100 

98.5 (77.6%) 

96.6 (87.3%) 

0.170 

ED 

100 

93.3 (baseline) 

83.1 (36.7%) 

0.005 

HD 

100 

99.9 (98.5%) 

82.8 (35.6%) 

0.200 

KL 

100 

99.7 (95.5%) 

73.3 (baseline) 

0.020 

KT-hOW 

100 

96.6 (49.3%) 

93.1 (74.2%) 

1.210 

DR-i-KL-fOW-i-50 

100 

99.7 (95.5%) 

100 (100%) 

3.630 


Table 1: Phantom classification scores using the test data. 

All techniques classified all points of Class 1 properly. The worst classifications 
for classes 2 and 3 were produced, respectively, by the Euclidean and by the Kullback- 
Leibler distances. Class 3 presents the largest variability, hence improvement, in the 
results. The Hellinger distance is the second best technique with a marginal im¬ 
provement of 35.6% over the baseline. The Euclidean distance improves 36.7%. In¬ 
troducing optimized weights improves the results of using the KL distance by 74.2%. 
Nevertheless, these classification procedures are outperformed by the maximum 
likelihood rule (ML), which is 87.3% better than the baseline. Albeit remarkably 
good, this last result is overshadowed by our proposal that attains 100% of accuracy, 
thus improving the baseline 100%. 

Fig-0 shows these results. We include two images that show the evolution of 
the solution of the diffusion-reaction equation for n = 25 and n = 50 iterations. Ta- 
ble0also shows the CPU time (in seconds) for each classification method. We use a 
simple C-i-i- implementation in an Intel Core i5 (2.50 GHz) architecture without par- 
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allelization or other optimization techniques. We notice that, even with this basic 
implementation, the methods are quite fast and in particular the proposed method 
(DR+KL+OW+50) takes just 3.63 seconds. 



Figure 1: Results of different classification techniques: (a) original image, (h) ML 
classification, (c) ED classification, (d) HD classification, (e) KL classification, 
(f) KL+OW classification, (g,h) evolution of the solution of the diffusion-reaction sys¬ 
tem for 25 and 50 iterations respectively, and (i) DR-i-KL-rOW+50 classification. 


5.2 Real Image 

In the second experiment we use a PolSAR image obtained by the AIRSAR sensor 
over the San Francisco bayj^ The image size is 600 x 448 pixels. Four classes are 
readily identifiable: Ocean, Grass, Park and Urban areas. We selected a ROI for each 
class, whose observations were used to estimate the Wishart parameters according 

^Details about these freely available data can be accessed at ESA’s web page https: //earth.esa. 
int/web/polsarpro/data-sources/sample-datasets 
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to the results presented in Section [^p.2| these ROIs are shown in Fig.|^ These data 
are also used as the reference to estimate the accuracy of the different classification 
strategies. 



Figure 2: (a) San Francisco bay PolSAR image with ROIs, (b) ML classification, (c) ED 
classification, and (d) FID classification. 

Table [^presents the overall accuracy of classifying the four classes observed in 
the San Francisco image by the techniques here considered. With the exception of 
the Ocean class, that is almost perfectly classified by all methods, the worst result is 
used a baseline. Class-wide improvements with respect to the baseline are reported 
in parentheses. 

The results for the real image are consistent with the ones obtained for the simu¬ 
lated data: Euclidean and KL distances provide the worst results. Flellinger distance 
classification is the next best technique, except in the Urban area where the Eu¬ 
clidean distance is slighty better. ML clearly outperforms Euclidean, BCL and Flellinger 
distances, and even the weighted KL distance except in the case of Urban areas 
where the weighted KL distance is considerably better (53.40%) than ML (39.18%). 
The optimized weights for this problem were w = (0.44,0.38,0.12,0.06). As expected, 
the Urban area presents a high variability which is consistent with the small weight 
value (0.06) obtained for this area in the optimization procedure. 

The proposed technique using the diffusion-reaction equation consistently im¬ 
proves in a significant way the classification accuracy (90.4%, 82.2% and 77.8% for 
the Grass, Park and Urban areas, respectively), and when a perfect classification is 
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Figure 3: (a) BCL classification, (b) KL+OW classification, (c) evolution of the solution 
of the diffusion-reaction system after 50 iterations, and (d) DR-i-KL+OW-t50 classifi¬ 
cation. 


obtained, as in Ocean, the iterative procedure does not make it worse. These im¬ 
provements are not incremental with respect to both the baseline and to the use of 
optimized weights, and render classified images that can be considered excellent. 
In Fig.|^ andj^we illustrate the results of the different classification techniques for 
the San Francisco bay image. In table we also show the CPU time for the differ¬ 
ent classification methods. The results are consistent with the ones obtained for the 
phantom image taking into account the size of the images. In particular the pro¬ 
posed method (DR-i-KL-tOW-i-50) takes just 7.67 seconds. 

Fig.j^shows the evolution of the diffusion-reaction system described by Eq. (12) 
observed in the classification of the San Francisco bay PolSAR image. The ordinates 
of the two sequences are shown in logarithmic scale. As explained above, the reac¬ 
tion term of the diffusion-reaction system tends to move each PolSAR image pixel 
value towards its nearest class representative SmminCX, pi in particular we expect 
that the weighted distance average of Z; to p goes to zero when the data 

evolves according to the diffusion-reaction system. This behaviour is illustrated in 
the continuous gray curve. Initially such distance average is equal to, approximately, 
4.38; we observe that it goes very quickly, faster than exponentially, to zero. In a 
complementary fashion, the diffusion term tends to smooth the solution and, as a 
consequence, it makes some pixels change their classification. The circles show the 


14 





Method 


Accuracy (Improvement) by Class 

CPU 

time 


Ocean 

Grass 

Park 

Urban 

(s) 

ML 

99.8 

94.9 (82.5%) 

86.1 (60.1%) 

57.4 (39.3%) 

0.64 

ED 

99.8 

70.9 (haseline) 

65.2 (haseline) 

38.5 (12.4%) 

0.02 

HD 

100.00 

93.7 (78.4%) 

68.7(10.1%) 

34.8 (7.1%) 

0.81 

KL 

100.00 

89.4 (63.6%) 

65.4 (0.6%) 

29.8 (baseline) 

0.09 

KL-tOW 

100.00 

91.7 (71.5%) 

79.9 (42.2%) 

66.2 (51.9%) 

0.46 

DR-i-KL-tOW+50 

100.00 

97.2 (90.4%) 

93.8 (82.2%) 

84.4 (77.8%) 

7.67 


Table 2: San Francisco bay classification scores using the test data. 


evolution of the percentage of pixels which change their associated class in each 
iteration (that is, the value mminfXij) is altered). We observe that 1.9% of pixels 
change their associated class in the first iteration, and that this rate decreases across 
the iterations also faster than exponentially. 


6 Conclusions 

In this paper we address the problem of PolSAR image classification. The main con¬ 
tributions of the paper are, on the one hand to introduce a stochastic weighted dis¬ 
tance strategy to increase the class discrimination power and on the other hand to 
embed the classification in a diffusion-reaction differential system which aims to 
include smoothing constraints in the classification procedure. The weights used to 
improve the class discrimination power of the distance are estimated by minimizing 
a new energy functional. These weights take into account the PolSAR matrix vari¬ 
ability in the class regions (the lower the weight, the larger the expected variability 
inside the class). 

We show that introducing these weights is equivalent to modifying the number 
of looks of the Wishart model for each class and, as discussed by Torres et al. |2T| , this 
takes account of the variability due to texture that is not embedded in the Wishart 
model. We present experiments with simulated and real PolSAR images. These ex¬ 
periments confirm that the introduction of the weighted distances improves con¬ 
siderably the classification results. 

By embedding the classification problem in a diffusion-reaction differential sys¬ 
tem, and using the weighted distances, the classification obtained with simulated 
and real data outperforms, in a significant way, the classification score obtained with 
the usual classification strategies including maximum likelihood, and Euclidean, 
Hellinger and Kullback-Leibler distances. 

Further comparisons, including computational costs, will be made with respect 
to other contextual techniques as, for instance, the simple mode and the ICM al¬ 
gorithm 13. Also, techniques for estimating optimal values of the parameters that 
govern the process (a and 61 ) will be sought. 
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Observables along Iterations 



0 50 100 150 200 250 300 


Iteration 


Figure 4: Evolution of the diffusion-reaction system: KL weighted distance average 
of Zj to its nearest class representative j) (continuous gray curve), and per¬ 
centage of pixels which change their class (empty circles) in each itera¬ 

tion, semilogarithmic scale for the ordinates. 
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